Results 1 - 20 of 48
1.
G3 (Bethesda) ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38427916

ABSTRACT

Tanoak (Notholithocarpus densiflorus) is an evergreen tree in the Fagaceae family found in California and southern Oregon. Historically, tanoak acorns were an important food source for Native American tribes and the bark was used extensively in the leather tanning process. Although tanoak was long considered a disjunct relictual element of the Asian stone oaks (Lithocarpus spp.), phylogenetic analysis has determined that it is an example of convergent evolution. Tanoaks are deeply divergent from oaks (Quercus) of the Pacific Northwest and comprise a new genus with a single species. These trees are highly susceptible to 'sudden oak death' (SOD), a disease caused by the plant pathogen Phytophthora ramorum that has led to widespread mortality of tanoaks. Here, we set out to assemble the genome and perform comparative studies among a number of individuals that demonstrated varying levels of susceptibility to SOD. First, we sequenced and de novo assembled a draft reference genome of N. densiflorus using co-barcoded library processing methods and an MGI DNBSEQ-G400 sequencer. To increase the contiguity of the final assembly, we also sequenced Oxford Nanopore (ONT) long reads to 30X coverage. To our knowledge, the draft genome reported here is one of the more contiguous and complete genomes of a tree species published to date, with a contig N50 of ∼1.2 Mb, a scaffold N50 of ∼2.1 Mb, and a complete gene score of 95.5% through BUSCO analysis. In addition, we sequenced 11 genetically distinct individuals and mapped these onto the draft reference genome, enabling the discovery of almost 25 million single nucleotide polymorphisms and ∼4.4 million small insertions and deletions. Finally, using co-barcoded data we were able to generate complete haplotype coverage of all 11 genomes.
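
The contig and scaffold N50 values quoted in this abstract follow a standard definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation is given here; the function name and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Toy assembly (hypothetical lengths, arbitrary units); total length is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70, since 80 + 70 reaches half of 300
```

A larger N50 means that more of the assembly sits in long sequences, which is why the abstract reports it as a contiguity measure.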

2.
Cell Rep Methods ; 3(3): 100437, 2023 03 27.
Article in English | MEDLINE | ID: mdl-37056375

ABSTRACT

Sequencing of hypervariable regions as well as internal transcribed spacer regions of ribosomal RNA genes (rDNA) is broadly used to identify bacteria and fungi, but taxonomic and phylogenetic resolution is hampered by insufficient sequencing length using high throughput, cost-efficient second-generation sequencing. We developed a method to obtain nearly full-length rDNA by assembling single DNA molecules combining DNA co-barcoding with single-tube long fragment read technology and second-generation sequencing. Benchmarking was performed using mock bacterial and fungal communities as well as two forest soil samples. All mock species rDNA were successfully recovered with identities above 99.5% compared to the reference sequences. From the soil samples we obtained good coverage with identification of more than 20,000 unknown species, as well as high abundance correlation between replicates. This approach provides a cost-effective method for obtaining extensive and accurate information on complex environmental microbial communities.


Subject(s)
Eukaryota , Microbiota , Phylogeny , Eukaryota/genetics , rRNA Genes , DNA Sequence Analysis/methods , Ribosomal RNA/genetics , Bacteria/genetics , Microbiota/genetics , Ribosomal DNA/genetics , Soil
3.
Methods Mol Biol ; 2590: 71-84, 2023.
Article in English | MEDLINE | ID: mdl-36335493

ABSTRACT

In this chapter, we describe how Long Fragment Read (LFR) technology can be applied to samples consisting of very few cells (5-20) to enable complete genome sequencing and haplotyping with a very low false positive error rate. LFR is a method for processing DNA or cells prior to sequencing on any second-generation DNA sequencing platform (e.g., MGI's DNBSEQ, Illumina sequencers, etc.). First, the LFR process incorporates a low-bias whole genome amplification step allowing accurate sequencing from very low DNA inputs (as low as 32 picograms, the mass contained within 5 diploid human cells). In addition, LFR enables the haplotyping of nearly all genomic variations with N50 contig lengths up to ~1 Mb. Furthermore, if data from this method are analyzed with parental genotype data, it is possible to generate phased variants in uninterrupted contigs spanning entire chromosomes. Importantly, the barcoding process utilized in this method allows for the detection and correction of most amplification, sequencing, and mapping errors, yielding false positive error rates as low as 10⁻⁹. Finally, the cost of this method is modest and enables extremely high-quality whole genome sequence and haplotype data from as few as 5 cells. We know of few other methods that can achieve this.


Subject(s)
Human Genome , High-Throughput Nucleotide Sequencing , Humans , Haplotypes/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , DNA , Technology
4.
Methods Mol Biol ; 2590: 59-70, 2023.
Article in English | MEDLINE | ID: mdl-36335492

ABSTRACT

In this chapter, we describe a simple, low-cost method for making many copies of a single DNA molecule (1-10 kb in length) as a concatemer on a long DNA strand. This can enable applications requiring high-quality contiguous sequence and haplotype data from long single DNA molecules at large scale.


Subject(s)
DNA , High-Throughput Nucleotide Sequencing , Haplotypes/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics
5.
Methods Mol Biol ; 2590: 101-125, 2023.
Article in English | MEDLINE | ID: mdl-36335495

ABSTRACT

In this chapter, we describe single-tube long fragment read (stLFR), a simple preparation method for whole-genome sequencing and physical haplotyping based on the DNA co-barcoding strategy. Similar to LFR, stLFR applies the concept of adding the same barcode to subfragments derived from the same long DNA molecule. However, instead of a 384-well plate, stLFR uses the surface of micron-sized magnetic beads to create millions of virtual compartments in a single reaction tube. This is enabled by a split and pool barcoded bead preparation process capable of generating ~500,000 copies of the same unique barcode, from a library of ~3.6 billion unique barcodes, on each bead. The instruments and devices used in the stLFR process are easily accessible in nearly all standard molecular biology laboratories, and the cost of reagents can be as low as 30 dollars per sample. stLFR libraries can be sequenced by standard second-generation sequencing instruments (e.g., MGI or Illumina devices), and the barcode sharing information enables detection and phasing of all variations, including large structural variations. In addition, stLFR data can be used to scaffold contigs and de novo assemble genomes.
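
The co-barcoding idea described in this abstract (subfragments from one long DNA molecule share one barcode) can be illustrated by grouping mapped reads by barcode and merging nearby positions into inferred long fragments. The sketch below is a toy illustration under stated assumptions: the read tuple layout, the `infer_fragments` name, and the 50 kb gap threshold are hypothetical and are not part of the stLFR protocol or its software.

```python
from collections import defaultdict


def infer_fragments(reads, max_gap=50_000):
    """Group mapped reads by barcode and merge nearby reads into fragments.

    reads   -- iterable of (barcode, chromosome, position) tuples
    max_gap -- hypothetical threshold: reads with the same barcode on the
               same chromosome that lie further apart than this are
               treated as coming from different long DNA molecules

    Returns {barcode: [(chromosome, start, end, read_count), ...]}.
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    fragments = defaultdict(list)
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current fragment, start a new one.
                fragments[barcode].append((chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        fragments[barcode].append((chrom, start, prev, count))
    return dict(fragments)


toy_reads = [
    ("BC1", "chr1", 1_000),
    ("BC1", "chr1", 21_000),
    ("BC1", "chr1", 40_000),
    ("BC1", "chr1", 900_000),  # far away: inferred as a second molecule
    ("BC2", "chr2", 500),
]
for barcode, frags in infer_fragments(toy_reads).items():
    print(barcode, frags)
```

Variants observed on reads that fall in the same inferred fragment come from the same physical molecule, which is the basis for the phasing and scaffolding uses mentioned in the abstract.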


Subject(s)
High-Throughput Nucleotide Sequencing , Cost-Benefit Analysis , Haplotypes , Whole Genome Sequencing , High-Throughput Nucleotide Sequencing/methods , Gene Library , DNA Sequence Analysis
6.
Front Med (Lausanne) ; 8: 654696, 2021.
Article in English | MEDLINE | ID: mdl-34164412

ABSTRACT

Early detection and treatment of visual impairment diseases are critical and integral to combating avoidable blindness. To enable this, artificial intelligence-based disease identification approaches are vital for visual impairment diseases, especially for people living in areas with few ophthalmologists. In this study, we demonstrated the identification of a large variety of visual impairment diseases using a coarse-to-fine approach. We designed a hierarchical deep learning network, which is composed of a family of multi-task and multi-label learning classifiers representing different levels of eye diseases derived from a predefined hierarchical eye disease taxonomy. A multi-level disease-guided loss function was proposed to learn the fine-grained variability of eye disease features. The proposed framework was trained for both ocular surface and retinal images, independently. The training dataset comprised 7,100 clinical images from 1,600 patients with 100 diseases. To show the feasibility of the proposed framework, we demonstrated eye disease identification on the first two levels of the eye disease taxonomy, namely 7 ocular diseases with 4 ocular surface diseases and 3 retinal fundus diseases in level 1 and 17 subclasses with 9 ocular surface diseases and 8 retinal fundus diseases in level 2. The proposed framework is flexible and extensible: it can be trained on more levels given sufficient training data for each disease subtype (e.g., the 17 classes of level 2 include 100 subtype diseases defined as level 3 diseases). The performance of the proposed framework was evaluated against 40 board-certified ophthalmologists on clinical cases with various visual impairment diseases and showed that the proposed framework had high sensitivity and specificity, with the area under the receiver operating characteristic curve ranging from 0.743 to 0.989 in identifying all of the major causes of blindness considered. Further assessment of 4,670 cases in a tertiary eye center also demonstrated that the proposed framework achieved a high identification accuracy rate for different visual impairment diseases compared with that of human graders in a clinical setting. The proposed hierarchical deep learning framework would improve clinical practice in ophthalmology and broaden the scope of service available, especially for people living in areas with few ophthalmologists.

7.
Bioinformatics ; 37(15): 2095-2102, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33538292

ABSTRACT

MOTIVATION: Achieving a near complete understanding of how the genome of an individual affects the phenotypes of that individual requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging. RESULTS: To address this, we have developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy, or HAST, which uses parental information to classify reads into maternal or paternal. Once sorted, these reads are used to independently de novo assemble the parent-specific haplotypes. We applied HAST to cobarcoded second-generation sequencing data from an Asian individual, resulting in a haplotype assembly covering 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (∼99.7%) and recall (∼95.9%) represent a substantial improvement over the commonly used tool for assembling cobarcoded reads (Supernova), and are comparable to a trio-binning-based third-generation long-read assembly method (TrioCanu) but with a significantly higher single-base accuracy [up to 99.99997% (Q65)]. This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies. AVAILABILITY AND IMPLEMENTATION: The code of the analysis is available at https://github.com/BGI-Qingdao/HAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
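
The trio-binning step described in this abstract (using parental information to classify reads as maternal or paternal) is commonly explained in terms of k-mers that occur in only one parent. The following is a toy sketch of that general idea and not the HAST implementation: the sequences, the value of k, and the function names are illustrative assumptions, and real tools derive parent-specific k-mers from parental sequencing reads and must tolerate sequencing errors.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_read(read, maternal_only, paternal_only, k):
    """Assign a read to a parent by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Hypothetical parental sequences that differ at one position.
K = 4
mother = "ACGTACGTTAGC"
father = "ACGTACGATAGC"
maternal_only = kmers(mother, K) - kmers(father, K)
paternal_only = kmers(father, K) - kmers(mother, K)

for read in ["ACGTTAG", "ACGATAG", "GTACG"]:
    print(read, bin_read(read, maternal_only, paternal_only, K))
```

Once reads are separated this way, each bin can be assembled independently, which is the strategy the abstract attributes to HAST for co-barcoded data.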

8.
Gigascience ; 9(12)2020 12 21.
Article in English | MEDLINE | ID: mdl-33347571

ABSTRACT

BACKGROUND: Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. RESULTS: Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. CONCLUSIONS: The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.


Subject(s)
Bacterial Genome , High-Throughput Nucleotide Sequencing , Plant Genome , DNA Sequence Analysis , Software
9.
Gigascience ; 9(9)2020 09 01.
Article in English | MEDLINE | ID: mdl-32893860

ABSTRACT

BACKGROUND: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited. FINDINGS: We developed a software tool to close sequence gaps in genome assemblies, TGS-GapCloser, that uses low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge gap regions between 2 contigs within a scaffold, error-corrects only the candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 value of 3 human genome assemblies by 24-fold on average with only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, filling up to 94.8% of gaps with sequence data at 97.7% positive predictive value. These improved assemblies achieve 99.998% (Q46) single-base accuracy with final inserted sequences having 99.97% (Q35) accuracy, despite the high raw error rate of single-molecule reads, enabling high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as the ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data. CONCLUSIONS: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.
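
The Q values quoted alongside the accuracy percentages in this abstract are Phred-scaled quality scores, defined as Q = -10 · log10(error rate). A short sketch of the conversion follows; the function name is an illustrative assumption and the code is not taken from TGS-GapCloser.

```python
import math


def accuracy_to_phred(accuracy_percent):
    """Convert a per-base accuracy in percent to a Phred-scaled Q value."""
    error_rate = 1.0 - accuracy_percent / 100.0
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 100% for a finite Q value")
    return -10.0 * math.log10(error_rate)


# Accuracy figures quoted in the abstract above.
for accuracy in (99.998, 99.97):
    print(f"{accuracy}% -> Q{accuracy_to_phred(accuracy):.1f}")
```

Applied to the rounded percentages given here, the formula yields values within about one unit of the quoted Q46 and Q35, which is what one would expect if the percentages were rounded for reporting.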


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Computational Biology , Human Genome , Humans , DNA Sequence Analysis
10.
PeerJ ; 8: e8431, 2020.
Article in English | MEDLINE | ID: mdl-32231869

ABSTRACT

Recent advances in long fragment read (LFR, also known as linked-read technologies or read-cloud) technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, in the case of stLFR and 10X Genomics Chromium reads, the long fragments of a genome are covered sparsely by reads in each barcode and most barcodes are contained in multiple long fragments from different regions, which results in inefficient assembly when using long-range information. Thus, methods to address these shortcomings are vital for capitalizing on the additional information obtained using these technologies. We therefore designed IterCluster, a novel, alignment-free clustering algorithm that can cluster barcodes from the same target region of a genome, using k-mer frequency-based features and a Markov Cluster (MCL) approach to identify enough reads in a target region of a genome to ensure sufficient target genome sequence depth. The IterCluster method was validated using BGI stLFR and 10X Genomics Chromium read datasets. IterCluster had a higher precision and recall rate on BGI stLFR data compared to 10X Genomics Chromium read data. In addition, we demonstrated how IterCluster improves the de novo assembly results when using a divide-and-conquer strategy on a human genome data set (scaffold/contig N50 = 13.2 kbp/7.1 kbp vs. 17.1 kbp/11.9 kbp before and after IterCluster, respectively). IterCluster provides a new way of determining LFR barcode enrichment and a novel approach for de novo assembly using LFR data. IterCluster is open source and available at https://github.com/JianCong-WENG/IterCluster.
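
The alignment-free idea in this abstract, comparing barcodes through the k-mers of their reads, can be sketched with a toy example. This is not the IterCluster algorithm: the sketch below swaps the k-mer frequency features for plain k-mer sets compared by Jaccard similarity, and swaps Markov clustering (MCL) for simple connected components via union-find. All names, the value of k, and the similarity threshold are hypothetical.

```python
from itertools import combinations


def kmer_set(reads, k):
    """Collect every k-mer seen in any read carrying one barcode."""
    found = set()
    for read in reads:
        found.update(read[i:i + k] for i in range(len(read) - k + 1))
    return found


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_barcodes(barcode_reads, k=5, min_similarity=0.2):
    """Link barcodes whose k-mer sets overlap; return the linked groups."""
    profiles = {bc: kmer_set(reads, k) for bc, reads in barcode_reads.items()}
    parent = {bc: bc for bc in profiles}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for bc in profiles:
        groups.setdefault(find(bc), set()).add(bc)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical reads: BC1/BC2 overlap in one region, BC3/BC4 in another.
toy = {
    "BC1": ["ACGTTGCAAGGC"],
    "BC2": ["TGCAAGGCTTAC"],
    "BC3": ["TTTTGGGGCCCC"],
    "BC4": ["GGGGCCCCAAAA"],
}
print(cluster_barcodes(toy))
```

Connected components are a much cruder grouping than MCL, which weights and prunes the similarity graph; the sketch only shows why shared k-mers can stand in for alignment when deciding that two barcodes cover the same region.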

11.
Cell Stem Cell ; 25(5): 697-712.e6, 2019 Nov 07.
Article in English | MEDLINE | ID: mdl-31588047

ABSTRACT

To investigate the contribution of parental genomes to early embryogenesis, we profiled the single-cell transcriptomes of human biparental and uniparental embryos systematically from the 1-cell to the morula stage. We observed that uniparental embryos exhibited variable and decreased embryonic genome activation (EGA). Comparative transcriptome analysis identified 807 maternally biased expressed genes (MBGs) and 581 paternally biased expressed genes (PBGs) in the preimplantation stages. MBGs became apparent at the 4-cell stage and contributed to the initiation of EGA, whereas PBGs preferentially appeared at the 8-cell stage and might affect embryo compaction and trophectoderm specification. Regulatory network analysis revealed that DUX4, EGR2, and DUXA are key transcription factors in MBGs' expression; ZNF263 and KLF3 are important for PBGs' expression. We demonstrated that parent-specific DNA methylation might account for the expression of most PBGs. Our results provide a valuable resource to understand parental genome activation and might help to elucidate parent-of-origin effects in early human development.


Subject(s)
Blastocyst/metabolism , Mammalian Embryo/metabolism , Embryonic Development/genetics , Transcriptome/genetics , DNA Methylation , DNA-Binding Proteins/metabolism , Early Growth Response Protein 2/metabolism , Female , Gene Expression Profiling , Developmental Gene Expression Regulation/genetics , Gene Ontology , Homeodomain Proteins/metabolism , Humans , Kruppel-Like Transcription Factors/metabolism , Oocytes/metabolism , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , RNA-Seq , Repetitive Nucleic Acid Sequences/genetics , Single-Cell Analysis , Time-Lapse Imaging
12.
Sci Data ; 6(1): 65, 2019 May 20.
Article in English | MEDLINE | ID: mdl-31110271

ABSTRACT

The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a fundamental epigenomics approach and has been widely used in profiling the chromatin accessibility dynamics in multiple species. A comprehensive reference of ATAC-seq datasets for mammalian tissues is important for the understanding of regulatory specificity and developmental abnormality caused by genetic or environmental alterations. Here, we report an adult mouse ATAC-seq atlas by producing a total of 66 ATAC-seq profiles from 20 primary tissues of both male and female mice. The ATAC-seq read enrichment, fragment size distribution, and reproducibility between replicates demonstrated the high quality of the full dataset. We identified a total of 296,574 accessible elements, of which 26,916 showed tissue-specific accessibility. Further, we identified key transcription factors specific to distinct tissues and found that the enrichment of each motif reflects the developmental similarities across tissues. In summary, our study provides an important resource on the mouse epigenome and will be of great importance to various scientific disciplines such as development, cell reprogramming, and genetic disease.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin/genetics , Epigenomics , Animals , Female , Male , Mice , Inbred C57BL Mice , Reproducibility of Results , Transcription Factors/genetics , Transposases
13.
Genome Res ; 29(5): 798-808, 2019 05.
Article in English | MEDLINE | ID: mdl-30940689

ABSTRACT

Here, we describe single-tube long fragment read (stLFR), a technology that enables sequencing of data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block lengths up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
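
The figure of up to 3.6 billion unique barcodes in this abstract comes from a combinatorial process, in which each bead acquires one barcode segment per round. A sketch of the arithmetic follows, assuming three rounds with 1,536 segment choices per round; those two numbers are an assumption made for illustration and are not stated in the abstract.

```python
from itertools import product


def barcode_space(segments_per_round, rounds):
    """Number of distinct barcodes obtainable by combinatorial assembly."""
    return segments_per_round ** rounds


def enumerate_barcodes(segment_sets):
    """List every full barcode obtainable from the given segment sets."""
    return ["".join(parts) for parts in product(*segment_sets)]


# Assumed design: 3 rounds x 1,536 segments (not given in the abstract).
total = barcode_space(1536, 3)
print(total)  # 3623878656, i.e. about 3.6 billion

# Share of that space used if 50 million barcodes go into one sample.
print(f"{50_000_000 / total:.2%}")

# Toy illustration with two hypothetical segments per round.
print(enumerate_barcodes([["AA", "CC"], ["GG", "TT"], ["AC", "GT"]]))
```

Because a sample draws on only a small share of the available barcode space, two long DNA fragments rarely receive the same barcode, which is consistent with the abstract's description of the cobarcoding as practically nonredundant.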


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Cost-Benefit Analysis , Diploidy , Gene Library , Human Genome , Genomics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/economics , Humans , Whole Genome Sequencing/economics
14.
Nucleic Acids Res ; 47(6): 2981-2995, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30698752

ABSTRACT

To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
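
The cis/trans distinction in this abstract can be made concrete for a single gene: if all heterozygous coding variants sit on the same homologue the gene is in cis configuration, and if they are split between the two homologues it is in trans. The sketch below is a minimal illustration under that reading; the function name, the variant identifiers, and the "not applicable" label are hypothetical and do not come from the paper.

```python
def phase_configuration(homologue_1, homologue_2):
    """Classify one gene from the variants carried on each homologue.

    homologue_1, homologue_2 -- collections of variant identifiers found
    on each of the two chromosomal copies of the gene.
    """
    copy_1, copy_2 = set(homologue_1), set(homologue_2)
    # Variants present on both copies are homozygous and carry no phase.
    only_1 = copy_1 - copy_2
    only_2 = copy_2 - copy_1
    if len(only_1) + len(only_2) < 2:
        return "not applicable"  # phase needs two heterozygous variants
    if only_1 and only_2:
        return "trans"  # both copies of the gene carry a variant
    return "cis"  # all variants on one copy, the other copy is unaffected


# Hypothetical genes with made-up variant identifiers.
print(phase_configuration({"v1", "v2"}, set()))  # cis
print(phase_configuration({"v1"}, {"v2"}))       # trans
print(phase_configuration({"v1"}, set()))        # not applicable
```

The functional interest is that a cis gene retains one unaltered copy while a trans gene has both copies altered, which is why the abstract treats phase as relevant to interpreting protein-changing variation.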


Subject(s)
Diploidy , Human Genome/genetics , Open Reading Frames/genetics , Quantitative Trait Loci/genetics , Exome/genetics , Genetic Variation , Haplotypes/genetics , Humans , Single Nucleotide Polymorphism/genetics
15.
DNA Res ; 26(1): 45-53, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30428014

ABSTRACT

Nucleic acid ligases are crucial enzymes that repair breaks in DNA or RNA during synthesis, repair and recombination. Various genomic tools have been developed using the diverse activities of DNA/RNA ligases. Herein, we demonstrate a non-conventional ability of T4 DNA ligase to ligate 5' phosphorylated blunt-end double-stranded DNA to DNA breaks at 3'-recessed ends, gaps, or nicks to form a Y-shaped 3'-branch structure. Therefore, this base pairing-independent ligation is termed 3'-branch ligation (3'BL). In an extensive study of optimal ligation conditions, the presence of 10% PEG-8000 in the ligation buffer significantly increased ligation efficiency to more than 80%. Ligation efficiency varied slightly between different donor and acceptor sequences. More interestingly, we discovered that T4 DNA ligase efficiently ligated DNA to the 3'-recessed end of RNA, but not to that of DNA, in a DNA/RNA hybrid, suggesting a ternary complex formation preference of T4 DNA ligase. These novel properties of T4 DNA ligase can be utilized as a broad molecular technique in many important genomic applications, such as 3'-end labelling by adding a universal sequence; directional tagmentation for NGS library construction that achieves a theoretical 100% template usage; and targeted RNA NGS libraries with mitigated structure-based bias and adapter dimer problems.


Subject(s)
DNA Ligases/metabolism , DNA/metabolism , Genetic Engineering/methods , High-Throughput Nucleotide Sequencing/methods , RNA/metabolism , Humans
16.
Genet Med ; 20(5): 495-502, 2018 04.
Article in English | MEDLINE | ID: mdl-29758565

ABSTRACT

PURPOSE: We describe a novel syndrome in seven female patients with extreme developmental delay and neoteny. METHODS: All patients in this study were female, aged 4 to 23 years, were well below the fifth percentile in height and weight, had failed to develop sexually, and lacked the use of language. Karyotype and array chromosome genomic hybridization analysis failed to identify large-scale structural variations. To further understand the underlying cause of disease in these patients, whole-genome sequencing was performed. RESULTS: In five patients, coding de novo mutations (DNMs) were found in five different genes. These genes fell into similar functional categories of transcription regulation and chromatin modification. Comparison to a control population suggested that individuals with neotenic complex syndrome (NCS), a name that we propose herein, could have an excess of rare inherited variants in genes associated with developmental delay and autism, although the difference was not significant. CONCLUSION: We describe an extreme form of developmental delay, with the defining characteristic of neoteny. In most patients we identified coding DNMs in a set of genes intolerant of haploinsufficiency; however, it is not clear whether these contributed to NCS. Rare inherited variants may also be associated with NCS, but more samples need to be analyzed to achieve statistical significance.


Subject(s)
Multiple Abnormalities/diagnosis , Multiple Abnormalities/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Testing , Phenotype , Adolescent , Adult , Alleles , Amino Acid Substitution , Child , Preschool Child , Facies , Female , Gene Frequency , Genetic Testing/methods , Genotype , Humans , Male , Syndrome , Whole Genome Sequencing , Young Adult
17.
Clin Chem ; 64(4): 715-725, 2018 04.
Article in English | MEDLINE | ID: mdl-29545257

ABSTRACT

BACKGROUND: Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. METHODS: cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. RESULTS: Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 (CHD8) and LDL receptor-related protein 1 (LRP1), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. CONCLUSIONS: We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures.


Subject(s)
Amniotic Fluid/metabolism , Fetus/metabolism , Human Genome , Whole Genome Sequencing , Multiple Abnormalities/genetics , Adult , Amniocentesis , Autism Spectrum Disorder/genetics , Cohort Studies , DNA Copy Number Variations , Darier Disease/genetics , Eyebrows/abnormalities , Feasibility Studies , Female , Genetic Predisposition to Disease , Humans , Male , Mutation
18.
Hum Genomics ; 11(1): 30, 2017 Dec 08.
Article in English | MEDLINE | ID: mdl-29216901

ABSTRACT

BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a devastating disease whose complex pathology has been associated with a strong genetic component in the context of both familial and sporadic disease. Herein, we adopted a next-generation sequencing approach to Greek patients suffering from sporadic ALS (together with their healthy counterparts) in order to explore further the genetic basis of sporadic ALS (sALS). RESULTS: Whole-genome sequencing analysis of Greek sALS patients revealed a positive association between FTO and TBC1D1 gene variants and sALS. Further, linkage disequilibrium analyses were suggestive of a specific disease-associated haplotype for FTO gene variants. Genotyping for these variants was performed in Greek, Sardinian, and Turkish sALS patients. A lack of association between FTO and TBC1D1 variants and sALS in patients of Sardinian and Turkish descent may suggest a founder effect in the Greek population. FTO was found to be highly expressed in motor neurons, while in silico analyses predicted an impact on FTO and TBC1D1 mRNA splicing for the genomic variants in question. CONCLUSIONS: To our knowledge, this is the first study to present a possible association between FTO gene variants and the genetic etiology of sALS. In addition, the next-generation sequencing-based genomics approach coupled with the two-step validation strategy described herein has the potential to be applied to other types of human complex genetic disorders in order to identify variants of clinical significance.


Subject(s)
Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics , Amyotrophic Lateral Sclerosis/genetics , Alpha-Ketoglutarate-Dependent Dioxygenase FTO/metabolism , Case-Control Studies , Computer Simulation , Founder Effect , GTPase-Activating Proteins/genetics , Greece , Haplotypes , Humans , Linkage Disequilibrium , Motor Neurons/pathology , Motor Neurons/physiology , Single Nucleotide Polymorphism
19.
20.
Cancer Res ; 77(16): 4530-4541, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28811315

ABSTRACT

Much effort has been dedicated to developing circulating tumor cells (CTC) as a noninvasive cancer biopsy, but with limited success as yet. In this study, we combine a method for isolation of highly pure CTCs using immunomagnetic enrichment/fluorescence-activated cell sorting with advanced whole genome sequencing (WGS), based on long fragment read technology, to illustrate the utility of an accurate, comprehensive, phased, and quantitative genomic analysis platform for CTCs. Whole genomes of 34 CTCs from a patient with metastatic breast cancer were analyzed as 3,072 barcoded subgenomic compartments of long DNA. WGS resulted in a read coverage of 23× per cell and an ensemble call rate of >95%. These barcoded reads enabled accurate detection of somatic mutations present in as few as 12% of CTCs. We found in CTCs a total of 2,766 somatic single-nucleotide variants and 543 indels and multi-base substitutions, 23 of which altered amino acid sequences. Another 16,961 somatic single-nucleotide variants and 8,408 indels and multi-base substitutions, 77 of which were nonsynonymous, were detected with varying degrees of prevalence across the 34 CTCs. On the basis of our whole genome data of mutations found in all CTCs, we identified driver mutations and the tissue of origin of these cells, suggesting personalized combination therapies beyond the scope of most gene panels. Taken together, our results show how advanced WGS of CTCs can lead to high-resolution analyses of cancers that can reliably guide personalized therapy.


Subject(s)
Genomics/methods , Neoplasms/drug therapy , Circulating Neoplastic Cells/metabolism , Female , Humans , Middle Aged , Neoplasm Metastasis , Circulating Neoplastic Cells/pathology
...